Abstract

Uncertainty around multimodel ensemble forecasts of changes in future climate reduces the accuracy
of those forecasts. For very uncertain forecasts this effect may mean that the forecasts should
not be used. We investigate the use of the well-known Bayesian Information Criterion (BIC) to make 
the decision as to whether a forecast should be used or ignored. 

1 Introduction 

The climate predictions produced by numerical models predict changes in climate relative to the present 
day, rather than predicting absolute levels of future climate. The ratio r of the size of the change to the 
uncertainty around that change is then a measure of the confidence one can have in a forecast. It also 
determines whether or not it would be better to ignore a forecast, use it, or adjust it in some way. We have
studied this question in Jewson and Hawkins (2009c), in which we compared the relative effectiveness of
five different ways of using a multimodel numerical model climate forecast, as a function of this ratio r.
We tested the different methods in terms of the mean squared error of predictions they would produce, 
using Monte Carlo simulations. We found that for an ensemble based on 10 truly independent climate
models, for small values of this ratio (r < 0.73) it would be better to ignore the forecast completely. For
larger values of this ratio (0.73 < r < 2.02) it is best to adjust the forecast towards zero, and for the 
largest values of this ratio (r > 2.02) it is best to use the forecast unadjusted. 

These results cannot, however, be used literally as a decision rule. This is because, in practice, we never 
know the exact value of the ratio r since it is defined in terms of the actual mean change in climate and 
the actual uncertainty around that change (which are unknown) rather than the estimated change and 
the estimated uncertainty (which are known). 

It is tempting to replace the real values of the mean and uncertainty with their estimates, calculate an 
estimate of r, and then use the ranges given above. However, rather than do that it may be better to 
treat this as a statistical model selection problem, and use standard model selection methods. Perhaps 
the most standard model selection rule is the Bayesian Information Criterion (BIC) (see many statistics
textbooks, such as Wasserman (2004)), which gives a way to select between models which have been
fitted by maximum likelihood. Two of the five methods used in Jewson and Hawkins (2009c) were fitted
using maximum likelihood (those that involve either ignoring or using the ensemble mean), with the others
using a second stage of fitting for the 'damping' parameters. As a first step, therefore, we apply BIC
to the two methods from Jewson and Hawkins (2009c) that were fitted using maximum likelihood, and
postpone to a future study the question of how to apply model selection to the remaining three methods.
In section 2 we present the mathematical setup and the two different methods we will consider for turning
multimodel ensemble output into a probabilistic prediction. In section 3 we derive the BIC for each of
the methods. In section 4 we compare the derived BIC values, and derive an expression for the critical
value of r that determines the threshold between the two methods. In section 5 we apply BIC-based
model selection to CMIP derived predictions of UK rainfall, and finally in section 6 we discuss the
results.

2 Two Methods for Interpreting Multimodel Numerical Model
Climate Forecasts

Our mathematical setup follows that of Jewson and Hawkins (2009b) and Jewson and Hawkins (2009c).



We imagine that we have an ensemble of n climate models. Ideally each climate model has been run 
in a large initial condition ensemble, although in practice that may not matter if the impact of initial 
condition uncertainty is small relative to model uncertainty. 

We use a classical statistical framework in which we consider that the models have the same error statistics 
and are samples from an infinite population of all possible models with those error statistics. We also 
assume, perhaps somewhat optimistically, that the mean of this population is equivalent to the true 
future climate (we call this the 'perfect ensemble assumption'). We suppose that the variance of this 
ensemble has been adjusted to compensate for possible dependencies between the models (perhaps using
the method of Jewson and Hawkins (2009a)) and hence that, because of this adjustment, we can consider
the adjusted ensemble members as being independent.

We write the mean of this ensemble as m and the sample variance as s^2, where:

m = \frac{1}{n} \sum_{i=1}^{n} x_i    (1)

s^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - m)^2    (2)

and x_i is the change predicted by the i-th model. The uncertainty around the estimate of the ensemble mean can be estimated by V, where:

V = \frac{s^2}{n}    (3)

We then consider the following two methods for turning this ensemble into a forecast:

2.1 Use Ensemble Mean, Use Ensemble Variance

In this method we make a prediction by fitting a normal distribution to the ensemble using maximum
likelihood. The mean and variance (\mu_1, \sigma_1^2) of the prediction are then given by:

\mu_1 = m

\sigma_1^2 = s^2

Note that we deliberately do not use the well known 'n - 1' expression for the variance, since we will
need our estimates of the parameters to be maximum likelihood estimates for BIC to apply.
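
The maximum likelihood fit described above can be sketched in a few lines. This is only an illustration: the function name fit_use_mean and the five ensemble values are hypothetical, and the values are not taken from any climate model. It contrasts the maximum likelihood variance (divisor n) with the 'n - 1' estimator that is deliberately not used here.

```python
import statistics


def fit_use_mean(x):
    """Maximum likelihood normal fit: returns (mu_1, sigma_1^2).

    The variance uses the divisor n (population variance), not n - 1.
    """
    m = statistics.fmean(x)
    s2 = statistics.pvariance(x, mu=m)
    return m, s2


# Hypothetical ensemble of predicted changes (illustrative values only).
ensemble = [0.4, -0.2, 0.9, 0.1, 0.6]

mu1, sigma1_sq = fit_use_mean(ensemble)
unbiased = statistics.variance(ensemble)  # the 'n - 1' estimator, for contrast

print("mu_1 =", mu1)
print("sigma_1^2 (divisor n) =", sigma1_sq)
print("variance (divisor n - 1) =", unbiased)
```

The two variance estimates differ by the factor (n - 1)/n, which matters for small ensembles such as those considered here.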

2.2 Ignore Ensemble Mean, Use Ensemble Variance 

In this method we fit a normal distribution to the ensemble using maximum likelihood, but with the 
assumption that the mean of the distribution is zero. In other words, we ignore the mean of the ensemble, 
but use the variance. 

The mean and variance (\mu_2, \sigma_2^2) of the prediction are then given by:

\mu_2 = 0

\sigma_2^2 = \frac{1}{n} \sum_{i=1}^{n} x_i^2 = s^2 + m^2

We do not consider forecasts in which we also ignore the variance of the ensemble, or in which we use the
mean but ignore the variance, since doing so would be to assume that the predicted mean change had no 
uncertainty at all. That would not correspond to our beliefs, and would also give a BIC score of infinity 
which would be beaten by the other two methods by default. 
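
As a small numerical check of the zero-mean fit, the sketch below computes the maximum likelihood variance when the mean is fixed at zero and confirms that it equals s^2 + m^2. The function name fit_ignore_mean and the ensemble values are hypothetical and chosen only for illustration.

```python
import statistics


def fit_ignore_mean(x):
    """Maximum likelihood normal fit with the mean fixed at zero.

    Returns (mu_2, sigma_2^2), where sigma_2^2 is the mean of the squares.
    """
    sigma2_sq = sum(v * v for v in x) / len(x)
    return 0.0, sigma2_sq


# Hypothetical ensemble of predicted changes (illustrative values only).
ensemble = [0.4, -0.2, 0.9, 0.1, 0.6]

mu2, sigma2_sq = fit_ignore_mean(ensemble)
m = statistics.fmean(ensemble)
s2 = statistics.pvariance(ensemble, mu=m)

print("sigma_2^2 =", sigma2_sq)
print("s^2 + m^2 =", s2 + m * m)  # agrees with sigma_2^2 up to rounding
```

The extra m^2 is the price paid for ignoring the ensemble mean: the fitted distribution has to widen to cover an ensemble that is not centred on zero.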



3 BIC for each of the models 

In this section we now derive BIC values for the two methods defined in section 2.
The definition of BIC is:

B = -2l + k \ln n    (13)

where l is the log-likelihood attained at the maximum, k is the number of parameters in the model and n
is the number of independent data points used to fit the model (see, for example, Wasserman (2004)).
When comparing a number of models, the model with the lowest value for the BIC is considered the best 
model. 

As the number of parameters increases, models will generally fit the data better and l will increase.
Hence the -2l term will decrease while the k \ln n term will increase. Whether the BIC itself increases
or decreases depends on whether the extra parameters offer sufficient additional explanatory power to 
improve predictions, or whether the model becomes overfitted. 

Since we have forced independence between the different climate models by inflating the variance, the
likelihood is given by the density of the multivariate normal distribution for independent samples, which
is:

p(x) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x_i - \mu)^2}{2\sigma^2} \right)    (14)

The log-likelihood is then:

l(x) = -\frac{n}{2} \ln 2\pi - \frac{n}{2} \ln \sigma^2 - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2

Using this, we now derive expressions for the BIC for each model. The differences in BIC between our
two methods arise simply because of the different choices of \mu and \sigma given in sections 2.1 and 2.2 above.
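
The general recipe (evaluate the normal log-likelihood at the fitted parameters, then add the k ln n penalty) can be written down directly. The following is a minimal sketch, with hypothetical function names gaussian_loglik and bic, assuming independent normal samples with a common mean and variance as in the text.

```python
import math


def gaussian_loglik(x, mu, sigma_sq):
    """Log-likelihood of independent normal samples x given mu and sigma^2."""
    n = len(x)
    sum_sq = sum((v - mu) ** 2 for v in x)
    return (-0.5 * n * math.log(2 * math.pi)
            - 0.5 * n * math.log(sigma_sq)
            - sum_sq / (2 * sigma_sq))


def bic(loglik, k, n):
    """BIC = -2 l + k ln n; lower values indicate the preferred model."""
    return -2.0 * loglik + k * math.log(n)


# Tiny illustrative data set (hypothetical values).
x = [1.0, -1.0]
l_zero_mean = gaussian_loglik(x, mu=0.0, sigma_sq=1.0)
print("log-likelihood:", l_zero_mean)
print("BIC with k = 1:", bic(l_zero_mean, k=1, n=len(x)))
```

Both methods considered here use the same log-likelihood function; they differ only in the values of the mean and variance that are plugged in and in the parameter count k.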

3.1 BIC for Use Ensemble Mean, Use Ensemble Variance 

For this method the log-likelihood is:

l(x) = -\frac{n}{2} \ln 2\pi - \frac{n}{2} \ln s^2 - \frac{n}{2}    (18)
     = -\frac{n}{2} \left[ \ln 2\pi + \ln s^2 + 1 \right]    (19)

and there are two estimated parameters (k = 2), and so, from equation 13, the BIC is:

B_1 = n \left[ \ln 2\pi + \ln s^2 + 1 \right] + 2 \ln n    (20)

3.2 BIC for Ignore Ensemble Mean, Use Ensemble Variance 

For this method the log-likelihood is: 

l(x) = -\frac{n}{2} \ln 2\pi - \frac{n}{2} \ln(s^2 + m^2) - \frac{n}{2}    (21)
     = -\frac{n}{2} \left[ \ln 2\pi + \ln(s^2 + m^2) + 1 \right]    (22)

Note that the only difference from the analogous expression for the previous model is the m^2 term, which
is due to the bias in the forecast introduced by ignoring the ensemble mean, and which drives the BIC
upwards, and tends to make the forecast less favoured.

This time there is only one estimated parameter (k = 1), which, on the other hand, tends to make the
forecast more favoured, and so the BIC is:

B_2 = n \left[ \ln 2\pi + \ln(s^2 + m^2) + 1 \right] + \ln n    (23)
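
The two closed-form BIC expressions can be evaluated side by side. The sketch below is illustrative only: the function names and ensemble values are hypothetical, and the constant term inside the bracket is taken to be 1, which is what the maximised normal log-likelihood gives. Since that term is common to both scores, it does not affect the comparison.

```python
import math
import statistics


def bic_use_mean(x):
    """B_1: normal fit with mean m and variance s^2 (two parameters)."""
    n = len(x)
    s2 = statistics.pvariance(x)
    return n * (math.log(2 * math.pi) + math.log(s2) + 1) + 2 * math.log(n)


def bic_ignore_mean(x):
    """B_2: normal fit with mean 0 and variance s^2 + m^2 (one parameter)."""
    n = len(x)
    m = statistics.fmean(x)
    s2 = statistics.pvariance(x)
    return n * (math.log(2 * math.pi) + math.log(s2 + m * m) + 1) + math.log(n)


# Hypothetical ensemble of predicted changes (illustrative values only).
ensemble = [0.4, -0.2, 0.9, 0.1, 0.6]
b1 = bic_use_mean(ensemble)
b2 = bic_ignore_mean(ensemble)
print("B_1 =", b1, " B_2 =", b2)
print("preferred:", "use ensemble mean" if b1 < b2 else "ignore ensemble mean")
```

For this toy ensemble the mean is large relative to the spread, so the better fit outweighs the penalty for the extra parameter and B_1 is the lower score.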

4 Comparing BIC values between the two models

The difference between the BIC scores for our two methods is therefore:

B_1 - B_2 = \ln n - n \ln\left( 1 + \frac{m^2}{s^2} \right)

Writing r^2 = m^2 / s^2, this difference is zero when r takes the critical value:

r_c = \sqrt{n^{1/n} - 1}    (31)

For r below the critical value r_c given by the above expression (equation 31) BIC favours the simpler
model (ignore the ensemble mean, use the ensemble variance), and for r above this value it favours the
more complex model (use the ensemble mean, use the ensemble variance).

As an example, for n = 10 (i.e. an ensemble of size 10) this gives r_c = 0.509. Interestingly, this is at
a lower point than the crossover between these models in the Jewson and Hawkins (2009c) study, which
was 0.73. This highlights the difference between model selection and Monte Carlo simulation methods.
In this case using model selection would lead to using the more complex model for smaller changes in
predicted climate than using the results of Monte Carlo simulations would.

Figure 1 shows the critical values of r versus ensemble size, for a range of different ensemble sizes.
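
The dependence of the critical value on ensemble size can be tabulated with a short sketch. It assumes that r is the ratio of the ensemble mean m to the ensemble standard deviation s, so that equating the two BIC scores gives r_c = sqrt(n^(1/n) - 1). This reading is an assumption on our part, but it reproduces the values quoted in the text for n = 10 and n = 14. The function name critical_ratio is hypothetical.

```python
import math


def critical_ratio(n):
    """Critical value of r = m/s at which the two BIC scores are equal."""
    return math.sqrt(n ** (1.0 / n) - 1.0)


# Tabulate r_c for a range of ensemble sizes.
for n in (5, 10, 14, 20, 50):
    print(n, round(critical_ratio(n), 3))
```

The critical value falls as the ensemble grows, since a larger ensemble pins down the mean more tightly and so a smaller signal is enough to justify using it.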



5 Application to a Prediction of UK Rainfall 

We now apply the ideas developed above to coupled climate model predictions for UK rainfall. These
predictions are based on multimodel ensembles generated during the CMIP project (Meehl et al., 2007),
but with the ensemble variance inflated using the method described in Jewson and Hawkins (2009a),
using assumed correlations between the models of between 0 and 0.75.

In figure 2 we show the value of r for these predictions versus lead time. We also show the critical value
of r (which is 0.455, since we have 14 models in the ensemble) as a horizontal line. We see that, for a
correlation of 0.75, the predictions only cross the critical value of r in about 2025. The implication is
that before 2025 the spread across the models is so great that the ensemble mean is poorly estimated, to
the extent that using it to derive the mean of a forecast gives a worse forecast than ignoring it and
using a mean change of zero. It is clear, however, that the exact point at which the forecast crosses this
threshold would be heavily influenced by how the forecasts are smoothed in time. 
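
The procedure of comparing r with its critical value at each lead time can be sketched as follows. The numbers are invented toy values, not CMIP output, the function names are hypothetical, no variance inflation or smoothing in time is applied, and r is taken to be |m|/s as an assumption consistent with the critical values quoted above.

```python
import math
import statistics


def ratio(x):
    """Prediction-to-uncertainty ratio r = |m| / s for one ensemble."""
    return abs(statistics.fmean(x)) / statistics.pstdev(x)


def critical_ratio(n):
    """Critical value of r at which the two BIC scores are equal."""
    return math.sqrt(n ** (1.0 / n) - 1.0)


def first_year_above(ensembles_by_year):
    """Return the first year whose r exceeds the critical value, else None."""
    for year in sorted(ensembles_by_year):
        x = ensembles_by_year[year]
        if ratio(x) > critical_ratio(len(x)):
            return year
    return None


# Toy four-member ensembles of predicted change (hypothetical values).
toy = {
    2010: [0.5, -0.6, 0.2, -0.1],
    2020: [0.6, -0.4, 0.3, 0.1],
    2030: [0.9, 0.2, 0.6, 0.5],
}
print("first year with r above r_c:", first_year_above(toy))
```

In a real application the crossing year would depend on the assumed correlation between models and on how the forecasts are smoothed in time, as noted above.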



6 Summary 



We have considered the question of how to make a prediction of future climate using output from a
multimodel ensemble of climate models. In particular we have considered what to do when the number
of truly independent climate models is small, and the spread between the model results is large, as is 
often the case for certain variables. Using the standard BIC method for model selection, and making 
the perfect ensemble assumption, we have compared using the ensemble mean and variance with just 
using the ensemble variance and setting the mean change in climate to zero. This allows us to derive 
a simple expression for which of these two predictions to use. We have then applied the expression to 
CMIP predictions for UK rainfall, and concluded that for those predictions the ensemble mean should 
be ignored until at least 2025. 

It is important to note that model selection is not the same as statistical testing. In statistical testing 
one typically asks whether the data supports a certain hypothesis (such as whether there are significant 
changes in UK rainfall), and rejects the hypothesis unless the data is strongly consistent with the
hypothesis. In our case we are accepting the hypothesis that there is a change right from the start, rather
than testing that hypothesis. We are then trying to decide, at a practical level, whether modelling the 
change is likely to give a better forecast than ignoring the change. If a predicted change is weak and 
uncertain, it can be perfectly logical to believe that there is a change (based on independent arguments 
such as physical reasoning), but that it is impossible to estimate it well enough for it to improve
predictions. That appears to be the case for changes in UK rainfall in the near future. As an aside, it is also
possible, for slightly stronger and/or less uncertain predictions, that it cannot be proven that a signal is 
statistically significant, but that because you believe that the signal is real (again, presumably based on 
independent reasoning) then it is worthwhile to estimate the signal and include it in predictions. 

There are a number of directions in which we plan to extend this work. One is to try to include
the 'damped' predictions described in Jewson and Hawkins (2009c) in the model selection process. This
cannot be done using BIC, since the damped predictions are not maximum likelihood based. Still,
there may be other methods for model selection that one could use instead. The damped prediction
methods will hopefully allow model predictions with lower values of r to become usable, and would give
more accurate predictions for intermediate values of r. Another direction would be to take an objective
Bayesian approach to fitting the predictive distribution, in which case the normal becomes a t distribution.
Once again BIC can no longer be used, although presumably Bayes factors, from which BIC is derived, could be used instead.

Figure 1: Critical values of r for different ensemble sizes (where the models are independent, or where
the variance across the ensemble has been adjusted to compensate for any assumed dependency). For 
values of r greater than the critical value it is better (according to BIC) to use the ensemble mean than 
ignore it. For values of r less than the critical value it is better (according to BIC) to ignore the ensemble 
mean, and just use the ensemble variance. 

Figure 2: In blue, values of the prediction-to-uncertainty ratio r, for predictions of UK winter rainfall,
for four different assumptions about the correlations between the different models (0, 0.25, 0.5 and 0.75). 
In black, the critical value of this ratio r, above which using the ensemble mean gives better predictions 
than ignoring the ensemble mean. 



